
[spark] Support column aliases and comments for Paimon views #7625

Open
YannByron wants to merge 3 commits into apache:master from YannByron:spark-view-column-comment

Conversation

@YannByron
Contributor

Summary

  • Previously, creating a view with column aliases or comments threw UnsupportedOperationException.
  • This change applies column aliases and comments to the view schema before persisting, enabling SQL like CREATE VIEW v (col1 COMMENT 'desc', col2) AS SELECT ....
  • Added applyColumnAliasesAndComments method in CreatePaimonViewExec to handle alias renaming and comment attachment on StructField.
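The alias/comment application described above can be sketched in plain Scala. This is a simplified, self-contained illustration using a `Field` stand-in for Spark's `StructField`; the names and signature here are assumptions for illustration, not the PR's exact code.

```scala
// Simplified stand-in for Spark's StructField (illustrative only).
case class Field(name: String, dataType: String, comment: Option[String] = None)

// Apply user-specified aliases and comments by position, falling back to
// the field's original name (and no comment) where none was provided.
def applyColumnAliasesAndComments(
    fields: Seq[Field],
    aliases: Seq[String],
    comments: Seq[Option[String]]): Seq[Field] =
  fields.zipWithIndex.map { case (field, index) =>
    val newName = if (index < aliases.length) aliases(index) else field.name
    val newComment = if (index < comments.length) comments(index) else None
    field.copy(name = newName, comment = newComment)
  }

val schema = Seq(Field("a", "int"), Field("b", "string"))
val renamed = applyColumnAliasesAndComments(
  schema, Seq("col1", "col2"), Seq(Some("desc"), None))
// renamed: Seq(Field("col1", "int", Some("desc")), Field("col2", "string", None))
```

In the actual change, the resulting schema is what gets passed to `SupportView.createView`, so the aliases and comments are persisted with the view metadata.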

Test plan

  • Added test: Paimon View: create view with column comments
  • Added test: Paimon View: create view with column aliases
  • Added test: Paimon View: create view with column aliases and comments
  • All 13 PaimonViewTest cases passed (Spark 3.4)

🤖 Generated with Claude Code

Previously, creating a view with column aliases or comments threw
UnsupportedOperationException. This change applies column aliases
and comments to the view schema before persisting, enabling SQL like:
CREATE VIEW v (col1 COMMENT 'desc') AS SELECT ...

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Adds support for column aliases and per-column comments when creating Paimon-backed Spark views, removing the previous hard failure for these features.

Changes:

  • Apply user-specified view column aliases/comments onto the persisted view schema before createView.
  • Add unit tests covering column comments, aliases, and aliases+comments for view creation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Reviewed files:

  • paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/execution/PaimonViewExec.scala: Applies column aliases/comments to the schema passed to SupportView.createView instead of rejecting them.
  • paimon-spark/paimon-spark-ut/src/test/scala/org/apache/paimon/spark/sql/PaimonViewTestBase.scala: Adds test coverage for creating views with column aliases and/or comments and validating via DESC / SHOW CREATE TABLE.


Comment on lines 51 to 54

      override protected def run(): Seq[InternalRow] = {
    -   if (columnAliases.nonEmpty || columnComments.nonEmpty || queryColumnNames.nonEmpty) {
    -     throw new UnsupportedOperationException(
    -       "columnAliases, columnComments and queryColumnNames are not supported now")
    -   }
    +   // Apply column aliases and comments to the view schema
    +   val finalSchema = applyColumnAliasesAndComments(viewSchema, columnAliases, columnComments)


Copilot AI Apr 10, 2026


queryColumnNames is still accepted by CreatePaimonViewExec but is now silently ignored (previously it triggered an UnsupportedOperationException). Since this rule runs in the parser and bypasses Spark's built-in CREATE VIEW validation/semantics, this can lead to incorrect behavior for Spark versions/paths that populate queryColumnNames (or future compatibility work). Either implement the intended semantics for queryColumnNames or explicitly reject non-empty values with a clear AnalysisException/UnsupportedOperationException so we don't create views with partially applied metadata.

Comment on lines +84 to +88

    +   val fields = schema.fields.zipWithIndex.map {
    +     case (field, index) =>
    +       val newName = if (index < aliases.length) aliases(index) else field.name
    +       val newComment = if (index < comments.length) comments(index) else None


Copilot AI Apr 10, 2026


applyColumnAliasesAndComments currently applies aliases/comments by index but silently falls back to the original name/comment when the provided lists are shorter than the schema, and silently ignores extra entries when they are longer. Because Paimon rewrites CREATE VIEW in the parser (so Spark's native validation won’t run), this should enforce Spark-like semantics: if the user specified a column list (aliases and/or comments), its length must exactly match schema.fields.length, and aliases should be validated for duplicates/empties to avoid creating ambiguous view schemas.

YannByron and others added 2 commits April 10, 2026 22:13
…; fix CI

- Reject non-empty queryColumnNames with UnsupportedOperationException
- Validate that column aliases length matches schema fields count
- Fix SHOW CREATE TABLE test assertion to be backtick-agnostic across
  Spark versions (Spark 3.2 quotes column names with backticks)
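The validation added in this commit can be sketched as a self-contained check. The function name and message wording below are illustrative assumptions, not the PR's exact code; the behavior (reject non-empty queryColumnNames, require alias count to match the schema) follows the commit description.

```scala
// Hedged sketch of the validation described in the commit message above.
// Rejects queryColumnNames outright, and requires that an explicit column
// alias list exactly matches the number of columns in the view schema.
def validateViewColumns(
    queryColumnNames: Seq[String],
    columnAliases: Seq[String],
    schemaFieldCount: Int): Unit = {
  if (queryColumnNames.nonEmpty) {
    throw new UnsupportedOperationException("queryColumnNames is not supported now")
  }
  if (columnAliases.nonEmpty && columnAliases.length != schemaFieldCount) {
    throw new UnsupportedOperationException(
      s"The number of column aliases (${columnAliases.length}) does not match " +
        s"the number of columns in the view schema ($schemaFieldCount)")
  }
}

validateViewColumns(Seq.empty, Seq("c1", "c2"), 2) // passes
```

Using UnsupportedOperationException here (rather than AnalysisException) sidesteps the cross-version constructor differences noted below for Spark 4.0.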

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use UnsupportedOperationException instead of AnalysisException for
column aliases length validation, as AnalysisException constructor
signature varies across Spark versions (Spark 4.0 requires errorClass).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>